Empirical performance evaluation of page segmentation algorithms

Authors

  • Song Mao
  • Tapas Kanungo
Abstract

Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous segmentation algorithms have been proposed, there is relatively little literature on the comparative evaluation, empirical or theoretical, of these algorithms. We use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) we first create mutually exclusive training and test datasets with groundtruth, 2) we then select a meaningful and computable performance metric, 3) an optimization procedure is used to automatically search for the optimal parameter values of the segmentation algorithms, 4) the segmentation algorithms are evaluated on the test dataset, and finally 5) a statistical error analysis is performed to establish the statistical significance of the experimental results. We apply this methodology to five segmentation algorithms, three of which are representative research algorithms and the remaining two well-known commercial products. The three research algorithms evaluated are Nagy's X-Y cut, O'Gorman's Docstrum and Kise's Voronoi-diagram-based algorithm. The two commercial products evaluated are Caere Corporation's segmentation algorithm and ScanSoft Corporation's segmentation algorithm. The evaluations are conducted on 978 images from the University of Washington III dataset. We find that the performances of the Voronoi-based, Docstrum and Caere segmentation algorithms are not significantly different from each other, but they are significantly better than that of ScanSoft's segmentation algorithm, which in turn is significantly better than that of the X-Y cut algorithm. Furthermore, the commercial and research segmentation algorithms have comparable performance.
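The five-step methodology above can be illustrated end to end with a small sketch. Everything in it is a hypothetical stand-in, not the authors' setup. Each "image" is reduced to one synthetic number (a whitespace gap width). `algo_a` and `algo_b` are toy algorithms with one free parameter, not X-Y cut, Docstrum or the others. The per-image error rate is a made-up metric. A grid search replaces whatever optimizer the authors used. Step 5 assumes a paired t-test with a normal-approximation p-value. What the sketch is meant to show is the structure: parameters are tuned on the training split only, and the significance test pairs the two algorithms' errors on the same test images.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy setup: each "image" is reduced to one number, the width (in
# pixels) of the whitespace gap between its text blocks. A real study would
# load page images and groundtruth zones (e.g. from UW-III) instead.
Image = float
# An algorithm maps (image, free parameter) to a per-image error rate in [0, 1].
Algorithm = Callable[[Image, float], float]


def split_dataset(images: Sequence[Image], test_fraction: float,
                  seed: int = 0) -> Tuple[List[Image], List[Image]]:
    """Step 1: mutually exclusive training and test sets."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def per_image_errors(algo: Algorithm, images: Sequence[Image],
                     param: float) -> List[float]:
    """Step 2: a computable metric, here a (hypothetical) error rate per image."""
    return [algo(img, param) for img in images]


def train(algo: Algorithm, train_set: Sequence[Image],
          grid: Sequence[float]) -> float:
    """Step 3: choose the parameter value with the lowest mean training error.
    A plain grid search stands in for a real optimizer."""
    return min(grid, key=lambda p: statistics.mean(
        per_image_errors(algo, train_set, p)))


def paired_test(errors_a: Sequence[float],
                errors_b: Sequence[float]) -> Tuple[float, float]:
    """Step 5 (assumed test): paired t statistic on per-image error differences
    and a two-sided p-value from the normal approximation (fine for large n)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    return t, math.erfc(abs(t) / math.sqrt(2))


# Two hypothetical algorithms sharing one free parameter (a gap threshold);
# the second penalizes a mistuned threshold twice as heavily.
def algo_a(gap: Image, threshold: float) -> float:
    return min(1.0, abs(threshold - gap) / 20.0)


def algo_b(gap: Image, threshold: float) -> float:
    return min(1.0, abs(threshold - gap) / 10.0)


def run_demo(n_images: int = 978) -> dict:
    rng = random.Random(1)
    images = [rng.gauss(15.0, 4.0) for _ in range(n_images)]  # synthetic data
    train_set, test_set = split_dataset(images, test_fraction=0.3)
    grid = [0.5 * k for k in range(61)]  # thresholds 0.0 .. 30.0
    # Step 3: each algorithm is trained independently, on the training set only.
    p_a, p_b = train(algo_a, train_set, grid), train(algo_b, train_set, grid)
    # Step 4: evaluate the trained algorithms on the held-out test set.
    err_a = per_image_errors(algo_a, test_set, p_a)
    err_b = per_image_errors(algo_b, test_set, p_b)
    t, p = paired_test(err_a, err_b)
    return {"train": train_set, "test": test_set,
            "mean_a": statistics.mean(err_a), "mean_b": statistics.mean(err_b),
            "t": t, "p": p}


if __name__ == "__main__":
    r = run_demo()
    print(f"mean error A={r['mean_a']:.3f}  B={r['mean_b']:.3f}  "
          f"t={r['t']:.2f}  p={r['p']:.2g}")
```

Running the module prints the two mean test errors together with t and p. Pairing by image removes the image-to-image variation that both algorithms share, which is what makes small differences detectable. If the per-image differences are far from normal, a sign test or a Wilcoxon signed-rank test would be a reasonable alternative for step 5.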

Similar resources

Software Architecture of Pset: a Page Segmentation Evaluation Toolkit

Empirical performance evaluation of page segmentation algorithms has become increasingly important due to the numerous algorithms that are being proposed each year. In order to choose between these algorithms for a specific domain it is important to empirically evaluate their performance. To accomplish this task the document image analysis community needs i) standardized document image datasets ...

Segmentation Evaluation

Empirical performance evaluation of page segmentation algorithms has become increasingly important due to the numerous algorithms that are being proposed each year. In order to choose between these algorithms for a specific domain it is important to empirically evaluate their performance. To accomplish this task the document image analysis community needs i) standardized document image datasets ...

A Methodology for Empirical Performance Evaluation of Page Segmentation Algorithms

Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous page segmentation algorithms have been proposed, there is relatively little literature on the comparative evaluation, empirical or theoretical, of these algorithms. For the existing performance evaluation methods, two crucial components are usually missing: 1) automatic trainin...

PSET: A Page Segmentation Evaluation Toolkit

Empirical performance evaluation of page segmentation algorithms has become increasingly important due to the numerous algorithms that are being proposed each year. In order to choose between these algorithms for a specific domain it is important to empirically evaluate their performance. To accomplish this task the document image analysis community needs i) standardized document image datasets...

Empirical Performance Evaluation Methodology and Its Application to Page Segmentation Algorithms

While numerous page segmentation algorithms have been proposed in the literature, there is a lack of comparative evaluation, empirical or theoretical, of these algorithms. In the existing performance evaluation methods, two crucial components are usually missing: 1) automatic training of algorithms with free parameters and 2) statistical and error analysis of experimental results. In this paper, w...

Publication date: 2000